
    Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

    Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation. Comment: 16 pages, 3 figures.
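
    The core computational object in such planners is a KL-regularized ("free energy") Bellman backup. Below is a minimal sketch of a generalized value iteration step of this kind, assuming a known transition model; the function and variable names are illustrative, not taken from the paper.

        import numpy as np
        from scipy.special import logsumexp

        def free_energy_value_iteration(P, R, pi0, beta=5.0, gamma=0.95, tol=1e-8):
            """KL-regularized value iteration sketch.

            P    : transition tensor, shape (A, S, S)
            R    : reward matrix, shape (S, A)
            pi0  : reference (prior) policy, shape (S, A)
            beta : inverse temperature; beta -> infinity recovers standard value iteration
            """
            S, A = R.shape
            V = np.zeros(S)
            while True:
                # Q[s, a] = R[s, a] + gamma * E_{s' ~ P[a, s, :]}[V(s')]
                Q = R + gamma * np.einsum('asn,n->sa', P, V)
                # Soft backup: V(s) = (1/beta) log sum_a pi0(a|s) exp(beta * Q(s, a)),
                # i.e. the optimal value under a KL penalty towards pi0.
                V_new = logsumexp(beta * Q, b=pi0, axis=1) / beta
                if np.max(np.abs(V_new - V)) < tol:
                    return V_new
                V = V_new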

    Multi-objective Robust Strategy Synthesis for Interval Markov Decision Processes

    Interval Markov decision processes (IMDPs) generalise classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that prevents the knowledge of the exact transition probabilities. In this paper, we consider the problem of multi-objective robust strategy synthesis for interval MDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in the face of the transition probability uncertainty. We first show that this problem is PSPACE-hard. Then, we provide a value iteration-based decision algorithm to approximate the Pareto set of achievable points. We finally demonstrate the practical effectiveness of our proposed approaches by applying them on several case studies using a prototypical tool. Comment: This article is a full version of a paper accepted to the Conference on Quantitative Evaluation of SysTems (QEST) 2017.
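
    The robust backbone of such an algorithm is a Bellman backup that resolves each interval transition pessimistically. The sketch below shows the standard single-objective version of that backup, assuming lower/upper interval bounds on each transition distribution; the multi-objective Pareto approximation treated in the paper is more involved, and all names here are illustrative.

        import numpy as np

        def worst_case_expectation(v, lower, upper):
            """Minimise sum_i p_i * v_i over {p : lower <= p <= upper, sum(p) = 1}
            by greedily pushing the free probability mass onto the cheapest successors."""
            p = lower.copy()
            slack = 1.0 - p.sum()
            for i in np.argsort(v):
                add = min(upper[i] - p[i], slack)
                p[i] += add
                slack -= add
                if slack <= 0.0:
                    break
            return float(p @ v)

        def robust_value_iteration(lower, upper, R, gamma=0.95, iters=500):
            """lower, upper: interval transition bounds, shape (A, S, S); R: rewards, shape (S, A)."""
            A, S, _ = lower.shape
            V = np.zeros(S)
            for _ in range(iters):
                V = np.array([max(R[s, a] + gamma * worst_case_expectation(V, lower[a, s], upper[a, s])
                                  for a in range(A)) for s in range(S)])
            return V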

    Active Learning in Persistent Surveillance UAV Missions

    The performance of many complex UAV decision-making problems can be extremely sensitive to small errors in the model parameters. One way of mitigating this sensitivity is by designing algorithms that more effectively learn the model throughout the course of a mission. This paper addresses this important problem by considering model uncertainty in a multi-agent Markov Decision Process (MDP) and using an active learning approach to quickly learn transition model parameters. We build on previous research that allowed UAVs to passively update model parameter estimates by incorporating new state transition observations. In this work, however, the UAVs choose to actively reduce the uncertainty in their model parameters by taking exploratory and informative actions. These actions result in faster adaptation and, by explicitly accounting for UAV fuel dynamics, also mitigate the risk of exploration. This paper compares the nominal, passive learning approach against two methods for incorporating active learning into the MDP framework: (1) all state transitions are rewarded equally, and (2) state transition rewards are weighted according to the expected resulting reduction in the variance of the model parameter. In both cases, agent behaviors emerge that enable faster convergence of the uncertain model parameters to their true values.
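
    Method (2) needs a measure of how much one extra observation would shrink the uncertainty in a transition parameter. The following sketch uses a Dirichlet posterior over the outgoing transition probabilities of one state-action pair and scores it by the expected drop in posterior variance; this is an illustrative approximation, not the paper's exact weighting, and all names are assumptions.

        import numpy as np

        def dirichlet_variances(alpha):
            """Per-component variance of a Dirichlet(alpha) posterior over p(s' | s, a)."""
            a0 = alpha.sum()
            return alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1.0))

        def variance_reduction_bonus(alpha, scale=1.0):
            """Reward visiting (s, a) in proportion to the expected shrinkage of the
            posterior variance after one more observed transition from (s, a)."""
            expected_next = alpha + alpha / alpha.sum()   # expected counts under the predictive mean
            return scale * float(np.sum(dirichlet_variances(alpha) - dirichlet_variances(expected_next)))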

    An iterative decision-making scheme for Markov decision processes and its application to self-adaptive systems

    Software is often governed by and thus adapts to phenomena that occur at runtime. Unlike traditional decision problems, where a decision-making model is determined for reasoning, the adaptation logic of such software is concerned with empirical data and is subject to practical constraints. We present an Iterative Decision-Making Scheme (IDMS) that infers both point and interval estimates for the undetermined transition probabilities in a Markov Decision Process (MDP) based on sampled data, and iteratively computes a confidently optimal scheduler from a given finite subset of schedulers. The most important feature of IDMS is the flexibility to adjust the criterion of confident optimality and the sample size within the iteration, leading to a tradeoff between accuracy, data usage, and computational overhead. We apply IDMS to Rainbow, an existing self-adaptation framework, and conduct a case study using a Rainbow system to demonstrate the flexibility of IDMS.
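
    As a concrete illustration of the two ingredients mentioned in the abstract, the sketch below derives a point estimate and a Hoeffding-style interval for a transition probability from sampled transitions, and accepts a scheduler as "confidently optimal" when its pessimistic value dominates every rival's optimistic value. This is a generic sketch under assumed definitions, not the exact IDMS criterion.

        import math

        def transition_estimate(successes, trials, delta=0.05):
            """Point estimate and Hoeffding-style interval for one transition probability."""
            p_hat = successes / trials
            eps = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
            return p_hat, (max(0.0, p_hat - eps), min(1.0, p_hat + eps))

        def confidently_optimal(value_lo, value_hi, candidate):
            """Accept `candidate` if its pessimistic value beats every other scheduler's
            optimistic value; otherwise more samples (or a weaker criterion) are needed."""
            return all(value_lo[candidate] >= hi
                       for name, hi in value_hi.items() if name != candidate)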

    Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning

    Planning for multi-agent systems such as task assignment for teams of limited-fuel unmanned aerial vehicles (UAVs) is challenging due to uncertainties in the assumed models and the very large size of the planning space. Researchers have developed fast cooperative planners based on simple models (e.g., linear and deterministic dynamics), yet inaccuracies in assumed models will impact the resulting performance. Learning techniques are capable of adapting the model and providing better policies asymptotically compared to cooperative planners, yet they often violate the safety conditions of the system due to their exploratory nature. Moreover, they frequently require an impractically large number of interactions to perform well. This paper introduces the intelligent Cooperative Control Architecture (iCCA) as a framework for combining cooperative planners and reinforcement learning techniques. iCCA improves the policy of the cooperative planner while reducing the risk and sample complexity of the learner. Empirical results in gridworld and task assignment for fuel-limited UAV domains with problem sizes up to 9 billion state-action pairs verify the advantage of iCCA over pure learning and planning strategies.
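
    One way to combine a planner and a learner in the spirit described here is a risk filter that only lets the learner override the planner when the proposed action is estimated to be safe. The sketch below shows that pattern; the policy and risk interfaces are assumptions for illustration, not the iCCA API.

        def filtered_action(state, planner_policy, learner_policy, risk_estimate, risk_threshold=0.1):
            """Prefer the learner's suggestion, but fall back to the cooperative
            planner whenever the estimated risk of that suggestion is too high."""
            suggested = learner_policy(state)
            if risk_estimate(state, suggested) <= risk_threshold:
                return suggested
            return planner_policy(state)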

    Feedback Control of the National Airspace System

    This paper proposes a general modeling framework adapted to the feedback control of traffic flows in Eulerian models of the National Airspace System. It is shown that the problems of scheduling and routing aircraft flows in the National Airspace System can be posed as the control of a network of queues with load-dependent service rates. Focus can then shift to developing techniques to ensure that the aircraft queues in each airspace sector, which are an indicator of the air traffic controller workloads, are kept small. This paper uses the proposed framework to develop control laws that help prepare the National Airspace System for fast recovery from a weather event, given a probabilistic forecast of capacities. In particular, the model includes the management of airport arrivals and departures subject to runway capacity constraints, which are highly sensitive to weather disruptions. Funding: National Science Foundation (U.S.) (Contract ECCS-0745237); United States National Aeronautics and Space Administration (Contract NNA06CN24A).
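
    A minimal discrete-time version of the queueing abstraction described here is sketched below: each sector holds a queue of aircraft, inflow arrives from upstream sectors or schedules, and outflow is capped by a load-dependent service rate. The interface (a list of rate functions) is an assumption for illustration, not the paper's model.

        def queue_step(queues, inflows, service_rates, dt=1.0):
            """One Eulerian update: q_i <- q_i + (inflow_i - outflow_i) * dt, with the
            outflow limited both by the load-dependent capacity and by what is queued."""
            updated = []
            for q, lam, mu in zip(queues, inflows, service_rates):
                outflow = min(q / dt, mu(q))      # mu(q): load-dependent service rate
                updated.append(max(0.0, q + (lam - outflow) * dt))
            return updated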

    Multi-Stage Resource Allocation Under Uncertainty

    In this paper, we discuss a strategic planning problem of allocating resources to groups of tasks organized in successive stages. Each stage is characterized by a set of survival rates whose values are imprecisely known. The goal is to allocate the resources to the tasks (i.e. to form 'teams') by dynamically re-organizing the teams at each stage, while minimizing a cost objective over the whole stage horizon. A modelling framework is proposed, based on linear programming with adjustable variables. The resulting 'uncertain linear program' is subsequently solved using a randomized scenario-sampling technique.
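
    In the scenario approach, the imprecisely known parameters are sampled and the uncertain constraints are enforced for every sampled scenario, which yields one ordinary LP. The sketch below illustrates this for a generic allocation problem; the sampler, cost vector, and demand vector are assumptions, not the paper's model.

        import numpy as np
        from scipy.optimize import linprog

        def scenario_allocation(cost, sample_survival, demand, n_scenarios=200, seed=0):
            """Draw survival-rate matrices S_k and require S_k @ x >= demand for every
            sampled scenario, then solve the resulting LP: min cost @ x, x >= 0."""
            rng = np.random.default_rng(seed)
            A_ub, b_ub = [], []
            for _ in range(n_scenarios):
                S = sample_survival(rng)          # shape (n_tasks, n_resources)
                A_ub.append(-S)                   # S @ x >= demand  <=>  -S @ x <= -demand
                b_ub.append(-demand)
            result = linprog(cost, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                             bounds=[(0, None)] * len(cost))
            return result.x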

    Randomized algorithms for probabilistic aircraft conflict detection

    A mid-range conflict alerting system is proposed, based on a measure of criticality which directly takes into account the uncertainty in the prediction of the aircraft positions. The use of randomized algorithms makes the computation of the criticality measure tractable. The performance of the algorithm is evaluated by Monte Carlo simulation on a stochastic ODE model of the aircraft motion.
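
    The criticality measure in such a scheme is typically an estimated probability of conflict, and randomization makes it computable by sampling predicted trajectories. A minimal Monte Carlo sketch follows; the trajectory sampler and the separation threshold are assumptions for illustration.

        import numpy as np

        def conflict_probability(sample_pair, n_samples=10_000, min_separation=5.0, seed=0):
            """Estimate P(conflict) by sampling pairs of predicted trajectories from a
            stochastic motion model and counting loss-of-separation events.
            `sample_pair(rng)` must return two (T, dim) arrays of predicted positions."""
            rng = np.random.default_rng(seed)
            conflicts = 0
            for _ in range(n_samples):
                traj_a, traj_b = sample_pair(rng)
                if np.min(np.linalg.norm(traj_a - traj_b, axis=1)) < min_separation:
                    conflicts += 1
            return conflicts / n_samples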

    Experimental Demonstration of Adaptive MDP-Based Planning with Model Uncertainty

    Markov decision processes (MDPs) are a natural framework for solving multi-agent planning problems since they can model stochastic system dynamics and interdependencies between agents. In these approaches, accurate modeling of the system in question is important, since mismodeling may lead to severely degraded performance (i.e. loss of vehicles). Furthermore, in many problems of interest, it may be difficult or impossible to obtain an accurate model before the system begins operating; rather, the model must be estimated online. Therefore, an adaptation mechanism that can estimate the system model and adjust the system control policy online can improve performance over a static (off-line) approach. This paper presents an MDP formulation of a multi-agent persistent surveillance problem and shows, in simulation, the importance of accurate modeling of the system. An adaptation mechanism, consisting of a Bayesian model estimator and a continuously running MDP solver, is then discussed. Finally, we present hardware flight results from the MIT RAVEN testbed that clearly demonstrate the performance benefits of this adaptive approach in the persistent surveillance problem.
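
    A common way to realize a Bayesian model estimator of the kind mentioned here is a Dirichlet count model over transitions whose posterior mean is handed to the MDP solver after each observation. The class below is an illustrative sketch under that assumption, not the paper's implementation.

        import numpy as np

        class DirichletTransitionModel:
            """Dirichlet counts per (state, action); the mean gives a point-estimate
            transition model that a planner can re-solve as new data arrive."""

            def __init__(self, n_states, n_actions, prior=1.0):
                self.counts = np.full((n_states, n_actions, n_states), prior)

            def observe(self, s, a, s_next):
                self.counts[s, a, s_next] += 1.0

            def mean_model(self):
                return self.counts / self.counts.sum(axis=2, keepdims=True)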

    Bounded parameter Markov decision processes with average reward criterion

    Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we provide results for average reward BMDPs. We establish a fundamental relationship between the discounted and the average reward problems, prove the existence of Blackwell optimal policies and, for both notions of optimality, derive algorithms that converge to the optimal value function.
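
    For intuition, the sketch below runs relative value iteration with the interval (bounded) transition parameters resolved either optimistically or pessimistically in each backup; the paper's algorithms and convergence analysis for the average-reward case are more involved, and all names here are illustrative.

        import numpy as np

        def interval_expectation(v, lower, upper, optimistic=False):
            """Best- or worst-case expectation of v over {p : lower <= p <= upper, sum(p) = 1}."""
            p, slack = lower.copy(), 1.0 - lower.sum()
            order = np.argsort(-v) if optimistic else np.argsort(v)
            for i in order:
                add = min(upper[i] - p[i], slack)
                p[i] += add
                slack -= add
            return float(p @ v)

        def relative_value_iteration(lower, upper, R, optimistic=False, iters=1000):
            """lower, upper: interval transition bounds (A, S, S); R: rewards (S, A).
            Returns an estimate of the optimal average reward (gain) and a bias vector."""
            A, S, _ = lower.shape
            h, gain = np.zeros(S), 0.0
            for _ in range(iters):
                V = np.array([max(R[s, a] + interval_expectation(h, lower[a, s], upper[a, s], optimistic)
                                  for a in range(A)) for s in range(S)])
                gain, h = V[0], V - V[0]    # normalise against a reference state
            return gain, h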